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Abstract 

Background: Lung adenocarcinoma is a highly heterogeneous disease with various etiologies, prognoses, and 
responses to therapy. Although genome-scale characterization of lung adenocarcinoma has been performed, a 
comprehensive somatic mutation analysis of EGFR/KRAS/ALK-negatwe lung adenocarcinoma in never-smokers has 
not been conducted. 

Methods: We analyzed whole exome sequencing data from 16 EGFR/KRAS/ALK-negatwe lung adenocarcinomas 
and additional 54 tumors in two expansion cohort sets. Candidate loci were validated by target capture and Sanger 
sequencing. Gene set analysis was performed using Ingenuity Pathway Analysis. 

Results: We identified 27 genes potentially implicated in the pathogenesis of lung adenocarcinoma. These 
included targetable genes involved in PBK/mTOR signaling {TSC1, PIK3CA, AKJ2) and receptor tyrosine kinase 
signaling {ERBB4) and genes not previously highlighted in lung adenocarcinomas, such as SETD2 and PBRM1 
(chromatin remodeling), CHEK2 and CDC27 (cell cycle), CUL3 and S0D2 (oxidative stress), and CSMD3 and TEG 
(immune response). In the expansion cohort (N = 70), TP53 was the most frequently altered gene (1 1%), followed by 
SETD2 (6%), CSMD3 (6%), ERBB2 (6%), and CDH10 (4%). In pathway analysis, the majority of altered genes were 
involved in cell cycle/DNA repair {P <0.001) and cAMP-dependent protein kinase signaling [P <0.001). 

Conclusions: The genomic makeup of EGFR/KRAS/ALK-negatwe lung adenocarcinomas in never-smokers is remarkably 
diverse. Genes involved in cell cycle regulation/DNA repair are implicated in tumorigenesis and represent potential 
therapeutic targets. 



Background 

Lung cancer is the leading cause of cancer deaths world- 
wide [1]. In 2008, 1.38 million deaths were attributed to 
lung cancer, accounting for approximately 20% of 
cancer-related deaths. Lung cancer is a highly heteroge- 
neous disease with regard to its etiology, prognosis, and 



* Correspondence: cbc1971@yuhs.ac; JIHYNI@yuhs.ac; duheebang@yonsei.ac.kr 
+ Equal contributors 

3 Yonsei Cancer Center, Division of Medical Oncology, Department of Internal 
Medicine, Yonsei University College of Medicine, Seoul 120-752, Korea 
"Department of Oral Biology, College of Dentistry, Yonsei University, Seoul, 
Republic of Korea 

department of Chemistry, Yonsei University, Seoul 120-752, Korea 
Full list of author information is available at the end of the article 



response to therapy, complicating both prevention and 
treatment [1]. Non-small cell lung cancer accounts for 
approximately 85% of newly diagnosed lung cancers and 
can be classified into two major histologic subtypes: 
adenocarcinoma (approximately 50% of cases) and squa- 
mous cell carcinoma (approximately 30%). Although the 
majority of lung cancer cases are attributed to tobacco 
smoke, approximately 25% of lung cancer cases world- 
wide occur in never-smokers. Lung cancers in never- 
smokers have distinct clinicopathologic characteristics 
and clinical outcomes [2,3]. 

The discovery of driver mutations, such as epidermal 
growth factor receptor (EGFR) and anaplastic lymphoma 
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kinase (ALK), has led to remarkable improvements in 
personalized therapies for lung adenocarcinoma [4]. 
For example, erlotinib and gefitinib has been particu- 
larly efficacious in patients with EGFR mutations, and 
crizotinib in patients with ALK fusions [5,6]. Actionable 
genetic alterations that are treatable with therapeutic 
agents have been identified in approximately 50% of 
lung adenocarcinomas and include mutations in EGFR, 
ERBB2, KRAS, ALK, BRAF, PIK3CA, AKT1, ROS1, NRAS, 
and MAP2K1 [4]. Therefore, identification of novel drug- 
gable targets in the remaining 50% of lung adenocarcin- 
omas is a top research priority. 

Comprehensive genomic approaches are being under- 
taken to accelerate the identification of new molecular 
targets and increase our understanding of the critical 
cellular and molecular mechanisms underlying lung 
cancer [7-13]. The first comprehensive mutation profil- 
ing of 623 genes in 188 adenocarcinomas identified 26 
significantly mutated genes, including known onco- 
genes (KRAS, EGFR, ERBB2, ERBB4, EPHA3 and other 
ephrin receptor genes, KDR, and FGFR4) and tumor 
suppressor genes (TPS3, CDKN2A, STK11, NF1, ATM, 
RBI, and APC) [7]. Recent studies using next-generation 
sequencing identified new candidate driver mutations, 
including RET1 rearrangements [8,14] and mutations in 
CSMD3 [9], MXRAS [10], U2AF1, RBM10, and ARID1A 
[11], and DACH1, CFTR, RELN, ABCBS, and HGF [12]. 
Previous studies focused primarily on the identification of 
new driver genes according to mutation frequency and 
pattern; systematic pathway-based analysis has not been 
performed. Moreover, sample sizes in previous studies 
were insufficient to resolve rare driver mutations in lung 
adenocarcinoma. 

In this study, we analyzed exome sequencing data 
from 16 EGFR/KRAS/ALK-neg&tive tumors and paired 
normal samples and an applicable expansion cohort of 
54 EGFR /KRAS -negative lung adenocarcinomas to iden- 
tify novel somatic mutations in lung adenocarcinomas of 
never-smokers. 



Methods 

Preparation of clinical samples 

Tumor and adjacent normal lung fresh tissues were ob- 
tained by surgical procedures. Clinical information in- 
cluding age, sex, smoking history, tumor histology, and 
tumor stage based on the seventh edition of the American 
Joint Committee on Cancer staging system was col- 
lected. Never-smokers were defined as patients who 
had a lifelong history of smoking fewer than 100 ciga- 
rettes. All patients provided informed consent. The 
study was approved by the Institutional Review Board 
of Severance Hospital (4-2011-0891) and conducted in 
accordance with the Helsinki Declaration [15]. 



EGFR/KRAS mutations were verified by Sanger se- 
quencing and ALK rearrangement was detected by 
hybridization probes. To screen for EGFR and KRAS 
mutations by Sanger sequencing, we used the following 
primer sequences: 5 -CAGATGTTATCGAGGTCCGA-3 ' 
and 5 -CAAGCAGAAGACGGCATACG-3 ' to detect de- 
letions in exon 19 of EGFR, 5 -CAAGCAGAAGACGG 
CATACG-3' and 5 -GACCACCGAGATCTACACTC-3 ' 
to detect the L858R mutation in exon 21 of EGFR, and 5'- 
GTG ACATGTTCTAATATAGTCAC-3 ' and 5 -TCTATT 
GTTGG ATCATATTCGTC-3 ' to detect mutations in 
codons 12 and 13 of KRAS. ALK rearrangements were 
verified using break-apart fluorescence in situ hybridization 
probes (Vysis LSI ALK Dual Color, Break Apart Rearrange- 
ment Probe; Abbott Molecular, Abbott Park, IL, USA). 
DNA was extracted from tissues using the DNeasy Blood 
& Tissue Kit (Qiagen, Valencia, CA, USA). 

Whole exome sequencing 

Extracted DNA was sheared and a genomic library prepared 
using the NEBNext kit (New England BioLabs, Inc., Ipswich, 
MA, USA) according to the manufacturers instructions. 
Exon sequences were captured using the TruSeq Exome 
Enrichment Kit (Illumina, San Diego, CA, USA). Whole 
exome sequencing was performed using the Illumina 
HiSeq2000 platform. Sequencing data are accessible at 
Sequence Read Archive ([16] accession number [SRA: 
SRP022932]). 

Exome data analyses 

The analysis flow chart is illustrated in Additional file 1: 
Figure SI. The references for software packages used for 
exome data analyses are summarized in Additional file 1. 
Briefly, all sequenced reads were aligned to the human 
reference genome National Center for Biotechnology 
Information build 37 (hgl9) using Novoalign. Local re- 
aligning around indels and pair-end fixing was per- 
formed by GATK (version 1.4-21) and Picard, and PCR 
duplicates were removed using Picard. Quality scores 
were recalibrated by GATK. 

Three variant calling programs were used to call single 
nucleotide variants: muTect (1.0.287783), VarScan (ver- 
sion 2.2.11), and GATK Unified Genotyper. Indels were 
called by the GATK Somatic Indel Detector with default 
parameters in paired sample mode. 

Mutated loci were annotated using ANNOVAR and 
Polyphen2. Only non-synonymous single nucleotide var- 
iants and indels in coding exons and splicing sites were 
included. Known single nucleotide polymorphisms with 
minor allelic frequency >5% in the 1000 Genome Project 
Phase I East Asian (2012 April) and NHLBI Exome 
Sequencing Project 6500 (2012 Oct) were annotated and 
removed by ANNOVAR. 
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Validation by molecular-inversion probe capture 

The molecular-inversion probe (MIP) capture method 
was used to validate 1,401 candidate loci identified in 
whole exome sequencing. We designed 3,726 probes to 
capture the candidate loci (Additional file 2: Table S7). 
The microarray-based MIP preparation and capture ex- 
periment followed MIP standard-operating procedures 
with modifications in the preparation of probes (manu- 
script in preparation) [17]. 

For MIP probe hybridization, 1 ug of genomic DNA, 
1.5 ul of Ampligase buffer (Epicentre, Madison, WI, 
USA), 1 ul of probe mixture (genomic DNA to probe ra- 
tio, 1:90), and distilled H 2 0 were combined to give a 
total volume of 15 ul. The reaction was carried out for 
5 minutes at 95°C and then the temperature was de- 
creased to 60°C at a rate of 0.1 °C per second followed by 
incubation for 24 hours at 60°C. After addition of 0.2 ul 



of Phusion polymerase (New England BioLabs Inc.), 1 ul 
of deoxyribonucleotide triphosphate (New England Bio- 
Labs, Inc.), 0.2 ul of Ampligase buffer (Epicentre), 4 
units of Ampligase DNA ligase (Epicentre), and 0.3 ul of 
distilled H 2 0, the mixture was incubated for an add- 
itional 24 hours. 

PCR products were purified with a Qiagen gel extraction 
kit, mixed equally based on concentrations determined 
using a Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, 
USA), and sent for Illumina HiSeq2000 sequencing. 

Molecular-inversion probe capture data analysis 

The raw data were aligned to the human reference gen- 
ome (hgl9) by Novoalign. Aligned data on the position 
of candidate loci were selected and transformed to 
pileup format by SAMtools. The candidate loci were de- 
fined as validated loci if the variant base was the same as 
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Figure 1 Mutation summary of 16 EGFR/KRAS/ALK-negative lung adenocarcinomas. 
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that of whole exome sequencing and the following cri- 
teria were satisfied: variant allele frequency in tumor 
>5%, reads supporting variant allele in normal <2. 

Validation by Sanger sequencing 

Candidate driver mutations and randomly selected 
validated loci were chosen for additional validation and 
appropriate primer pairs for Sanger sequencing were de- 
signed (Additional file 2: Table S8). PCR products were 
purified and sent for Sanger sequencing (Macrogen, Seoul, 
Korea). Sequencing data were analyzed with SeqMan 
(DNASTAR, Madison, WI, USA). 

Canonical pathway analysis 

The pathway analysis was performed through the use of 
Ingenuity Pathway Analysis (Ingenuity® Systems [18]). 
Pathways associated with a set of focus genes were iden- 
tified from the Ingenuity Pathways Analysis library of ca- 
nonical pathways. The P-value was measured to decide 
the likelihood that the association between focus genes 
and a given pathway was due to random chance. The 
more focus genes involved, the more likely the associ- 
ation is not due to random chance, and thus the more 
significant the P-value. A right-tailed Fishers exact test 
was used to calculate a P-value determining the prob- 
ability of an association between the focus genes and the 
canonical pathway. 

Statistical analyses 

Continuous clinical data (for example, age) were com- 
pared using independent Student s £-tests. Categorical data 
(for example, sex, ethnicity, stage, mutation frequency) 
were compared using a chi-squared test. The false- 
discovery rate was corrected for multiple comparisons 
using the method of Benjamini and Hochberg. All statis- 
tical tests were two-tailed, and a P-value <0.05 was consid- 
ered statistically significant. Data analyses were performed 
using R statistical software version 2.15.3 [19]. 

Results 

Exome sequencing of EGFR/KRAS/ALK-negat'we tumors in 
never-smokers 

We screened 230 surgically resected lung adenocarcin- 
oma samples to identify EGFR/KRAS/ALK-negative tu- 
mors in never-smokers. A total of 16 tumors (7% of all 
non-small cell lung cancers) were eligible for exome se- 
quencing. Tumor and normal samples had an average 
sequencing depth of 51.9x and 52.0 x respectively with 
average coverage of 94.6% for each (Additional file 1: 
Table SI). Somatic variants were validated by target- 
capture sequencing (154-fold depth; average coverage of 
94%) and Sanger sequencing, which had a concordance 
rate of 94% with exome sequencing (Additional file 1: 
Tables S2 and S3). We detected a median number of 10 



non- synonymous mutations and indels per tumor (range 
3 to 27; Additional file 1: Table S4). The median rate of 
non- synonymous mutation was 0.32 mutations per 
megabase, which was comparable to that in previous re- 
ports for never-smokers [11,12]. The average ratio of 
transitions to transversions was 1.95; G:C — > A:T transi- 
tions (37%) were the most frequent followed by A:T — > 
G:C transitions (21%), consistent with a previous lung 
cancer exome study [11]. Validated loci were further an- 
alyzed for functional prediction of amino acid changes 
using four different prediction algorithms (SIFT, Poly- 
pnea, LRT, and Mutation Taster) (Additional file 1). 

Non-synonymous somatic mutations in 16 EGFRI 
KRASIALK-negzXive tumors are summarized in Figure 1 
(full mutation information is provided in Additional file 1: 
Table S5). Overall, 14 of 16 patients (87%) had a putative 
non-synonymous mutation. Although EGFRIKRASIALK- 
negative tumors harbored heterogeneous mutation pro- 
files, somatically altered genes were functionally classified 
as follows: major mitogenic and targetable pathways such 
as PI3K/mTOR signaling (TSC1, PIK3CA, AKT2), receptor 
tyrosine kinase signaling (ERBB4), protein tyrosine phos- 
phatase (PTPRC), cell cycle (CHEK2, CDC27), and DNA 
repair (PARP4); tumor suppressor pathways including 
chromatin remodeling (SETD2, PBRM1, MBD2, MECP2), 
Wnt signaling (CTNNB1, TGFBR2), and NF-kB signaling 
{TFG); oxidative stress response (CUL3, SOD2) and differ- 
entiation (SYNE2, NDRG1) (similar to lung squamous 
cell carcinoma [20]); pathways not previously highlighted 



Table 1 Patient characteristics (N = 70) 


Characteristic 


This study 


Expansion cohort 3 


TCGA-LUAD 3 


Number of patients 


16 


40 


14 


Age (years) 








Median 


58.5 


63 


69 


Range 


47 to 79 


36 to 87 


52 to 79 


Sex (%) 








Male 


1 (6) 


7(17) 


2(14) 


Female 


15 (94) 


29 (73) 


12 (86) 


n/a 


0 


4(10) 


0 


Clinical stage (%) 








1 


13 (81) 


24 (60) 


9(65) 


II 


0 


6(15) 


2(14) 


III 


3(19) 


6(15) 


3(21) 


IV 


0 


1 (2) 


0 


n/a 


0 


3(8) 


0 


Ethnic group (%) 








Caucasian 


0 


16 (40) 


14(100) 


Asian 


16 (100) 


24 (60) 


0 



a Never-smokers with EGFR/KRAS-negat\ve adenocarcinoma. Mb, megabase; 
n/a, not available; SD, standard deviation; TCGA-LUAD, The Cancer Genome 
Atlas - Lung adenocarcinoma. 
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in carcinogenesis such as gamma- aminobutyric acid re- 
ceptor signaling (GABRD, GABRG1) and immune re- 
sponse (CSMD3, SYK); as well as YTHDF1 (suggested role 
in RNA binding) and PCDHB14 (role in cell adhesion). 

Investigation of functional domains in altered genes 

Of 32 loci shown in Figure 1, seven were found in the 
Catalogue of Somatic Mutations in Cancer (COSMIC 
v.65): ERBB4 V840I, PIK3CA G118D, CTNNB1 S37C, 
TGFBR2 R504W, MECP2 G273V, CDC27 A273G, and 
PARP4 I1039T. In addition, PIK3CA G118D (in squa- 
mous cell carcinoma), CTNNB1 S37C (in adenocarcin- 
oma), and MECP2 G273V (in adenocarcinoma) were 
previously reported in cases of lung cancer. 

To explore the functional effects of somatic variants, we 
investigated functional domains in altered loci. Seventeen 
loci of 27 genes shown in Figure 1 were located in func- 
tional domains, including kinase domains (ERBB4 (tyrosine 
kinase domain), AKT2 (serine/threonine kinase domain), 
TGFBR2 (tyrosine kinase domain), SYK (protein kinase 
domain)) or domains involved in oncogenic kinase 



activation (TSC1 (TPR/MLPl/MLP2-like protein)); in his- 
tone modification domains (SETD2 (SET domain), PBRM1 
(bromo domain), MBD2 (methyl CpG binding domain), 
MECP2 (methyl CpG binding domain)); in oxidative stress 
response/differentiation domains (CUL3 (cullin domain), 
SOD2 (superoxide dismutase domain), NDRG1 (Ndr family 
domain)); in ion-channel domains (GABRD (ion-channel 
binding domain), GABRG1 (ion-channel transmembrane 
domain)); or were for cell cycle checkpoint proteins 
(CHEK2 (fork head associated domain)). 

Somatic mutations in an expansion cohort of 
EGFR/KRAS-negat\ve tumors in never-smokers 

To validate and expand our mutation analysis in lung 
adenocarcinoma in never-smokers, we collected an ex- 
pansion dataset from five available lung adenocarcinoma 
studies [9-13] and a The Cancer Genome Atlas lung car- 
cinoma study [21], with no overlap with the study of 
Imielinski et al [11]. Clinical information of all patients 
including sex, age, tumor stage, and ethnicity is given 
in Table 1. A total of 54 EGFR/KR AS -negative tumors 
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Signaling (cAMP-mediated signaling) 
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Figure 2 Distribution and pathway analysis of somatic mutations. (A) Venn diagram plot comparing somatically altered genes among 
our cohort (N = 16), the expansion cohort (N = 40), and TCGA-LUAD cohort (N = 14). (B) Gene profiles across EGFR/KRAS-negatwe tumors from 
never-smokers. (C) Pathway analysis of altered genes in EGFR/KRAS-negatwe lung adenocarcinoma from never smokers (N = 70). The most 
significant functions are shown. TCGA-LUAD, The Cancer Genome Atlas - Lung adenocarcinoma. 
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Table 2 Mutated genes and loci information for EGFR/KRAS-negat\ve lung adenocarcinomas from our cohort, the 
expansion cohort, and The Cancer Genome Atlas cohort 

Gene Mutation frequency Mutation frequency in Mutation frequency in Total AA substitution COSMIC Number of damage 



in our set (N = 16) expansion set (N = 40) TCGA set (N = 14) (N) v.65 algorithms 

TP53 - 6 of 40 2 of 14 8 of 70 K93R 0 4 of 4 

122_122del 0 n/a 

Y124C 0 4 of 4 

C203F 0 4 of 4 

G206S 0 4 of 4 

R209W 0 4 of 4 

E219* 0 2 of 4 

G295A 0 4 of 4 

SETD2 1 of 16 2 of 40 1 of 14 4 of 70 R839* - 1 of 2 

V1576F - 3 of 3 

Q1981* - 2 of 2 

K2067* - 0 of 1 

CSMD3 1 of 16 1 of 40 2 of 14 4 of 70 P667S 0 3 of 4 

M1440I - 3 of 4 

K1928N - 3 of 4 

Y2028C - 3 of 4 

PTPRC 2 of 16 1 of 40 - 3 of 70 Y444N 0 3 of 4 

T453M - 3 of 4 

T1176M - 4 of 4 

SYNE2 1 of 16 2 of 40 - 3 of 70 2579_2580del - n/a 

E3880* - n/a 

E3903K 0 3 of 4 

GRIN2A - 3 of 40 - 3 of 70 L307Q - 3 of 4 

N886S 0 Oof 4 

T1069M 0 3 of 4 

CDH10 - 2 of 40 1 of 14 3 of 70 El 71 K 0 4 of 4 

D315H 0 1 of 4 

R472C 0 4 of 4 

ERBB2 - - 3 of 14 3 of 70 E740delinsEAYVM (2) - n/a 

G746delinsVC O n/a 

SMAD4 - - 3 of 14 3 of 70 S242* - 1 of 2 

R361S O 4 of 4 

D493N O 3 of 4 

CDC27 2 of 16 - - 2 of 70 I235T - 4 of 4 

A273G O 3 of 4 

GABRG1 2 of 16 - - 2 of 70 D43N - 2 of 4 

T318S - 3 of 4 

YTHDF1 2 of 1 6 - - 2 of 70 I492V (2) - 4 of 4 

PCDHB14 2 of 16 - - 2 of 70 Y670S (2) - 2 of 3 

MECP2 1 of 16 - 1 of 14 2of70 R168L - 2of3 

G273V O 3 of 3 

NRAS - 2 of 40 - 2 of 70 Q61H O 2 of 4 

Q61K O 3 of 4 
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Table 2 Mutated genes and loci information for EGFR/KRAS-negat\ve lung adenocarcinomas from our cohort, the 
expansion cohort, and The Cancer Genome Atlas cohort (Continued) 



MET 






2 of 14 


2 of 70 


Y1021* 
1027_1028del 




2 of 2 
NA 


EPHA2 




2 of 40 




2 of 70 


P147L 
P278S 


O 
O 


4 of 4 
2 of 4 


CHD2 




2 of 40 




2 of 70 


S391fs 
splicing(c.5 1 53) 


O 


n/a 
n/a 


MST1 






2 of 14 


2 of 70 


406_407del 
R535G 




n/a 
3 of 4 


ERBB4 


1 of 16 






1 of 70 


V840I 


O 


4 of 4 


PIK3CA 


1 of 16 






1 of 70 


G118D 


O 


2 of 3 


AKT2 


1 of 16 






1 of 70 


D153Y 




4 of 4 


TSC1 


1 of 16 






1 of 70 


Q793* 




2 of 2 


CTNNB1 


1 of 16 






1 of 70 


S37C 


O 


4 of 4 


TGFBR2 


1 of 16 






1 of 70 


R504W 


O 


4 of 4 


PBRM1 


1 of 16 






1 of 70 


E830* 




2 of 2 


MBD2 


1 of 16 






1 of 70 


E373K 




4 of 4 


CHEK2 


1 of 16 






1 of 70 


R191G 




3 of 4 


PAD PA 


i or i o 






i or /u 


1 1 n^QT 
1 1 uoy i 


n 
u 


A /-vf A 
H OT H 


CUL3 


1 of 16 






1 of 70 


Q516* 




2 of 2 


S0D2 


1 of 16 






1 of 70 


K194E 




3 of 4 


GABRD 


1 of 16 






1 of 70 


D174E 




4 of 4 


NDRG1 


1 of 16 






1 of 70 


L257fs 




n/a 


TFG 


1 of 16 






1 of 70 


E240* 




1 of 1 


SYK 


1 of 16 






1 of 70 


A400S 




4 of 4 


ASPSCR1 


1 of 16 






1 of 70 


L252Q 
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PCDHB15 


1 of 16 






1 of 70 


Y670S 




3 of 3 



COSMIC, Catalogue of Somatic Mutations in Cancer; n/a, not available; TCGA, The Cancer Genome Atlas. 



from never-smokers were analyzed. Information on 
non-synonymous and splicing site mutations were 
extracted from a pooled dataset The median rate of non- 
synonymous mutations in EGFR/KRAS-negative never- 
smokers was approximately 0.65 mutations per megabase 
and the median number of non-synonymous mutations 
per patient was 19.0. The average ratio of transitions to 
transversions was 1.07 and G:C— >A:T transitions (40%) 
were the most frequent, consistent with our data. 

Comparison of altered genes among the three cohorts 
is shown in Figure 2. SETD2 and CSMD3 were altered 
in all three cohorts (Figure 2A). Commonly altered 
genes with information on affected loci, amino acid 
changes, and functional predictions are summarized in 
Table 2 (full information is provided in Additional file 1: 
Table S6). The most frequently mutated gene was TP53, 
which was altered in 11% of tumors, followed by SETD2 
(6%, 4 of 70 cases), CSMD3 (6%, 4 of 70 cases), and 
ERBB2 (6%, 4 of 70 cases). PTPRC, SYNE2, GRIN2A, 



CDH10, and SMAD4 were each altered in 3 of 70 cases 
(4%). SETD2 interacts with p53 and regulates genes 
downstream of p53 in addition to increasing p53 stabil- 
ity [22]. Mutations in SETD2 were nonsense mutations 
in three cases and missense mutation in one case. The 
missense mutation V1576F is located in the SET do- 
main; one nonsense mutation, R839*, is a truncating 
mutation upstream of the SET domain, and two non- 
sense mutations, Q1981* and K2067*, are truncating 
mutation upstream of the WW domain. In addition to 
known cancer driver genes such as ERBB2 (6% of cases), 
NRAS (3%), MET (3%), PIK3CA (1%), AKT2 (1%), TSC1 
(1%), and ERBB4 (1%), several putative cancer genes 
were identified, such as PTPRC [23], SYNE2 [24], 
GRIN2A [25], and CDH10 [26]. The mutation pattern is 
summarized in Figure 2B. 

Pathway analysis of 1,760 genes that were altered in 70 
EGFR/KR AS -negative tumors of never-smokers revealed 
alterations in genes related to DNA repair and the cell 
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cycle, including components of p53/ATM signaling, Gl/S 
or G2/M checkpoint regulation, and non-homologous end 
joining (Figure 2C). The most significantly enriched 
pathway was cAMP-dependent protein kinase A signal- 
ing, which can activate the mitogen-activated protein 
kinase cascade in lung adenocarcinoma [27]. Other 
enriched functions of altered genes were calcium trans- 
port (P = 0.006), axonal guidance (P = 0.015), and Ephrin 
A signaling (P = 0.031). 

Discussion 

The somatic mutation profile in lung adenocarcinomas 
lacking targetable EGFR or KRAS mutations or ALK rear- 
rangements in never- smokers is highly complex. Our ex- 
ome analysis of 70 tumors identified several common 
mutations involving the known cancer genes TP53, NRAS, 
ERBB2, PIK3CA, and CTNNB1, but also mutations in 
SETD2, CSMD3, PTPRC, and SYNE2 (Figure 3). 

SETD2 (mutated in 6% of cases) is a histone methyl- 
transferase that is involved in transcriptional elongation 
and chromatin remodeling. Interaction with p53 is facili- 
tated by the SET and WW domains and might increase 
p53 stability [22]. Interestingly, SETD2 and TP53 muta- 
tions were mutually exclusive in lung adenocarcinoma of 
never-smokers (Figure 2B). CSMD3 (mutated in 6% of 
cases) is a transmembrane protein with CUB and sushi 
multiple domains that is thought to function in protein- 
protein interactions and the immune response. Recent 
studies showed that loss of CSMD3 increases prolifera- 
tion of airway epithelial cells [9] and may be involved in 
tumorigenesis in lung cancer. In our study, missense 
mutations (P667S, M1440I, K1928N, and Y2028C) in 
CSMD3 were predicted to be deleterious to protein 
function. PTPRC (mutated in 4% of cases) is a member 
of the protein tyrosine phosphatase family and regulates 
a variety of cellular processes including cell growth, dif- 
ferentiation, and tumorigenesis. PTPRC regulates the 
JAK/STAT signaling pathway and functional defects can 
activate JAK/STAT signaling [23]. We observed three mis- 
sense mutations (Y444N, T453M, T1176M) in PTPRC, all 
of which were predicted to be deleterious. SYNE2 plays a 
role in cadherin-mediated cell-cell adhesion and regulates 
the Wnt signaling pathway [24] . 

We identified several targetable pathways in EGFR/ 
KRAS/ALK-negative lung adenocarcinoma including PI3K/ 
mTOR signaling (TSC1, PIK3CA, AKT2), receptor tyrosine 
kinase signaling (ERBB4), cell cycle regulation (CHEK2, 
CDC27), and DNA repair (PARP4). PI3K pathway in- 
hibitors and cell cycle inhibitors are actively under in- 
vestigation for lung adenocarcinoma in preclinical and early 
phase clinical trials [28,29]. A current mutation screening 
program for tailored targeted therapies is also on-going in 
1,000 patients with advanced lung adenocarcinoma based 
on 10 single driver mutations: KRAS (25%), EGFR (23%), 
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Figure 3 Significantly mutated genes in EGF/?/K/?/4S-negative 
lung adenocarcinomas from never-smokers (N = 70). 




ALK rearrangements (6%), BRAF (3%), PIK3CA (3%), MET 
amplifications (2%), ERBB2 (1%), MEK1 (0.4%), NRAS 
(0.2%), and AKT1 (<0.1%) [30]. Additional genomic alter- 
ations will be incorporated in a comprehensive manner 
based on next-generation sequencing data. 

More than 200 putative cancer-causing genes have been 
identified in recent genomic landscape studies using next- 
generation sequencing technology, and several cellular 
processes not previously implicated in cancer have been 
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revealed, such as chromatin remodeling, splicing, and 
ubiquitination [31,32]. We identified alterations in genes 
involved in chromatin remodeling (PBRM1, SETD2), oxi- 
dative stress (CUL3, SOD2), immune response (CSMD3, 
SYK), and gamma-aminobutyric acid receptor signaling 
{GABRD, GABRG1) in lung adenocarcinoma. Interest- 
ingly, although somatic mutation is rare in EGFRIKRASI 
AL/C-negative lung adenocarcinoma of never-smokers, the 
PCDHB14 (cell adhesion) Y670S mutation and YTHDF1 
(RNA binding) I492V mutations were each found in 
two cases (12.5%). Future studies to elucidate the role 
of these newly implicated functions in tumorigenesis 
are warranted. 

Conclusions 

We identified novel somatic mutations in EGFRIKRASI 
ALKT-negative lung adenocarcinoma in never-smokers and 
investigated the mutation frequency of altered genes. 
EGFR/KRAS/ALK-negative lung adenocarcinoma in never- 
smokers is highly heterogeneous at the somatic mutation 
level. However, most of the altered genes were involved in 
the cell cycle, and might represent novel therapeutic tar- 
gets in lung adenocarcinoma. Future research on the func- 
tional role of chromatin remodeling, oxidative stress/ 
differentiation, and the immune response will enhance our 
understanding of the mechanisms of tumorigenesis. 
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